Orthogonalization on a general purpose graphics processing unit with double double and quad double arithmetic

Abstract

Our problem is to accurately solve linear systems on a general purpose graphics processing unit with double double and quad double arithmetic. The linear systems originate from the application of Newton's method on polynomial systems. Newton's method is applied as a corrector in a path following method, so the linear systems are solved in sequence and not simultaneously. One solution path may require the solution of thousands of linear systems. In previous work we reported good speedups with our implementation to evaluate and differentiate polynomial systems on the NVIDIA Tesla C2050. Although the cost of evaluation and differentiation often dominates the cost of linear system solving in Newton's method, because of the limited bandwidth of the communication between CPU and GPU, we cannot afford to send the linear system to the CPU for solving during path tracking. Because of large degrees, the Jacobian matrix may contain extreme values, requiring extended precision, leading to a significant overhead. This overhead of multiprecision arithmetic is our main motivation to develop a massively parallel algorithm. To allow overdetermined linear systems we solve linear systems in the least squares sense, computing the QR decomposition of the matrix by the modified Gram-Schmidt algorithm. We describe our implementation of the modified Gram-Schmidt orthogonalization method for the NVIDIA Tesla C2050, using double double and quad double arithmetic. Our experimental results show that the achieved speedups are sufficiently high to compensate for the overhead of one extra level of precision.
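
The least squares step named in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' GPU implementation: it runs on the CPU in ordinary double precision rather than double double or quad double arithmetic, and the function names `mgs_qr` and `least_squares` are hypothetical, chosen only for this example. It shows how modified Gram-Schmidt produces a QR decomposition of an overdetermined matrix and how that decomposition yields the least squares solution by back substitution.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m-by-n matrix A (list of rows), m >= n.

    Returns Q (m-by-n, orthonormal columns) and R (n-by-n, upper triangular)
    with A = Q R. Hypothetical helper, plain double precision.
    """
    m, n = len(A), len(A[0])
    # Work column by column: V[j] starts out as column j of A.
    V = [[A[i][j] for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Normalize the current column to obtain q_k.
        R[k][k] = math.sqrt(sum(x * x for x in V[k]))
        if R[k][k] == 0.0:
            raise ValueError("matrix is rank deficient")
        V[k] = [x / R[k][k] for x in V[k]]
        # Modified variant: remove the q_k component from every remaining
        # column right away, using the already updated columns.
        for j in range(k + 1, n):
            R[k][j] = sum(q * v for q, v in zip(V[k], V[j]))
            V[j] = [v - R[k][j] * q for q, v in zip(V[k], V[j])]
    Q = [[V[j][i] for j in range(n)] for i in range(m)]
    return Q, R


def least_squares(A, b):
    """Solve min ||A x - b||_2 via A = Q R, then R x = Q^T b."""
    Q, R = mgs_qr(A)
    m, n = len(A), len(A[0])
    # Project the right-hand side onto the orthonormal columns of Q.
    y = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(n)]
    # Back substitution on the upper triangular system R x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


if __name__ == "__main__":
    # Three equations, two unknowns: fit c0 + c1 * t through three points.
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    b = [1.0, 2.0, 2.0]
    print(least_squares(A, b))
```

In this sketch the inner loop over the remaining columns is the part that lends itself to parallel execution, since each column update is independent of the others once q_k is known.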

Bibliographic record

  • Authors

    Verschelde, Jan; Yoffe, Genady

  • Year: 2013
  • Original format: PDF
  • Language: English
